Robust registration of SAR and optical images based on deep learning and improved Harris algorithm

Traditional algorithms can achieve good results when registering homologous images, but they cannot achieve satisfactory results when registering synthetic aperture radar (SAR) and optical images. The difficulty is that the texture information and structures of images from different modalities differ greatly, which leads to poor registration results. To solve this problem, we present a robust matching framework for registration between SAR and optical images. First, a novel deep learning network is used to generate high-quality pseudo-optical images from SAR images. Next, feature points are detected and extracted using the multi-scale Harris algorithm. Then, descriptors for the feature points are constructed with the gradient location and orientation histogram (GLOH) method. Finally, the actual positions of the feature points are reconstructed through a feedback mechanism for matching. Experimental results demonstrate superior matching performance compared with state-of-the-art methods.


Methodology
Proposed network for SAR to optical image translation. In this section, we provide details of the proposed deep learning framework for generating pseudo-optical images from SAR images. The network consists of two main components: a colorization network and generative adversarial learning. In the colorization network, we introduce an adversarial loss for better image colorization.
Deep learning-based image colorization has been studied over the last few years 11,12 . Fully leveraging the contextual information of an image is the key step in an image colorization neural network. Generally, an encoder-decoder architecture is used to extract and exploit this contextual information: the input image is encoded into a set of feature maps in the middle of the network. However, this means that all information must pass through every layer of the network. For image colorization, sharing low-level information between the input and output is important, since the input and output should share the locations of prominent edges. For these reasons, we add skip connections that follow the general shape of an encoder-decoder CNN, as shown in Fig. 1. The colorization sub-network forms a symmetric encoder-decoder with 8 convolution layers and 3 skip connections. Each convolution layer uses a 3 × 3 kernel.
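The architecture described above can be sketched in PyTorch as follows. This is only a minimal illustration under stated assumptions: the text fixes the 8 convolution layers, the 3 skip connections and the 3 × 3 kernels, while the class name SkipEncoderDecoder, the channel widths, the stride-2 downsampling, the nearest-neighbour upsampling, the activations and the exact placement of the skip connections are hypothetical choices, not the layout of Fig. 1.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


def conv3x3(in_ch, out_ch, stride=1):
    # Every convolution uses a 3 x 3 kernel, as stated in the text.
    return nn.Conv2d(in_ch, out_ch, kernel_size=3, stride=stride, padding=1)


class SkipEncoderDecoder(nn.Module):
    """Symmetric encoder-decoder: 4 + 4 convolutions and 3 skip connections."""

    def __init__(self, in_ch=1, out_ch=3, base=16):  # widths are assumptions
        super().__init__()
        # Encoder: each stride-2 convolution halves the spatial size.
        self.enc1 = conv3x3(in_ch, base, stride=2)
        self.enc2 = conv3x3(base, base * 2, stride=2)
        self.enc3 = conv3x3(base * 2, base * 4, stride=2)
        self.enc4 = conv3x3(base * 4, base * 8, stride=2)
        # Decoder: inputs of dec2..dec4 are concatenated with an encoder map.
        self.dec1 = conv3x3(base * 8, base * 4)
        self.dec2 = conv3x3(base * 8, base * 2)  # dec1 output + enc3 skip
        self.dec3 = conv3x3(base * 4, base)      # dec2 output + enc2 skip
        self.dec4 = conv3x3(base * 2, out_ch)    # dec3 output + enc1 skip

    @staticmethod
    def _up(x):
        # Nearest-neighbour upsampling doubles the spatial size.
        return F.interpolate(x, scale_factor=2, mode="nearest")

    def forward(self, x):
        e1 = F.relu(self.enc1(x))
        e2 = F.relu(self.enc2(e1))
        e3 = F.relu(self.enc3(e2))
        e4 = F.relu(self.enc4(e3))
        d1 = F.relu(self.dec1(self._up(e4)))
        # Skip connections pass low-level edge information to the decoder.
        d2 = F.relu(self.dec2(self._up(torch.cat([d1, e3], dim=1))))
        d3 = F.relu(self.dec3(self._up(torch.cat([d2, e2], dim=1))))
        return torch.tanh(self.dec4(self._up(torch.cat([d3, e1], dim=1))))


net = SkipEncoderDecoder()
sar_patch = torch.randn(1, 1, 64, 64)  # random stand-in for a SAR patch
with torch.no_grad():
    pseudo_optical = net(sar_patch)    # three-channel output, same size
```

Concatenating encoder feature maps into the decoder is one common way to let prominent edges bypass the bottleneck; the input height and width must be divisible by 16 for the shapes in this sketch to line up.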
For the translation of SAR images, one important requirement is that the output image must be noise free and realistic 13 . A loss function commonly used in image translation problems is the L1 loss. Although the L1 loss has been shown to be very effective for image de-noising problems, it incentivizes an average, grayish color when it is uncertain which of several plausible color values a pixel should take on. In particular, L1 is minimized by choosing the median of the conditional probability density function over possible colors. Thus, the L1 loss alone is not suitable for image colorization. Recent studies have shown that an adversarial loss can learn that gray-looking outputs are unrealistic and encourage matching the true color distribution. Considering the pros and cons of both losses, we combine the per-pixel L1 loss and the adversarial loss with appropriate weights to form our refined loss function.
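The following small sketch illustrates the two points made above: a pure L1 objective settles on the median of the plausible color values, and the refined loss is a weighted sum of the L1 and adversarial terms. The function names, the sample values and the weights (100 and 1) are illustrative assumptions; the weights used in the paper are not stated.

```python
import statistics


def l1_cost(candidate, plausible_values):
    # Total absolute deviation of one predicted value from all plausible values.
    return sum(abs(candidate - v) for v in plausible_values)


def combined_loss(l1_term, adversarial_term, l1_weight=100.0, adv_weight=1.0):
    # Weighted sum of the per-pixel L1 loss and the adversarial loss.
    # The default weights are hypothetical, not taken from the paper.
    return l1_weight * l1_term + adv_weight * adversarial_term


# Plausible normalised color values for one uncertain pixel (made-up numbers).
plausible = [0.10, 0.15, 0.50, 0.90, 0.95]

# Brute-force search for the prediction that minimises the L1 cost.
candidates = [i / 100 for i in range(101)]
best = min(candidates, key=lambda c: l1_cost(c, plausible))

# The minimiser coincides with the median, a mid-range "grayish" value.
median_value = statistics.median(plausible)
```

Here the L1-optimal prediction lands in the middle of the range even though most plausible values are near the extremes, which is the averaging effect the adversarial term is meant to counteract.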
Perform gradient calculation and feature point extraction on the image. The image gradient must be calculated before feature points are extracted. The Sobel edge-detection operator quickly supplies the directional convolution kernels required for key point detection by the subsequent Harris algorithm 14 . First, two templates are defined, one for the horizontal and one for the vertical direction (Eq. (1)). Convolving these two templates with the image gray values I(x, y) gives the gradient values in the horizontal and vertical directions. To account for scale invariance, a scale parameter α i is introduced, and f H and f V are regarded as the convolution of two rectangular sub-windows with Gaussian kernel functions. In the resulting multi-scale Sobel operator, F H,α i and F V,α i are the gradients in the horizontal and vertical directions respectively, G α i is the Gaussian kernel function corresponding to α i , and * denotes the convolution operation. The scales used for the optical image and the SAR image correspond to each other. The gradient magnitude and direction can therefore be expressed as:
$$F_{M,\alpha_i} = \sqrt{F_{H,\alpha_i}^{2} + F_{V,\alpha_i}^{2}}, \qquad F_{O,\alpha_i} = \arctan\left(\frac{F_{V,\alpha_i}}{F_{H,\alpha_i}}\right)$$
where F M,α i is the gradient magnitude matrix of the image and F O,α i is the gradient direction matrix.
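A minimal NumPy sketch of this gradient step is given below. It assumes the standard 3 × 3 Sobel templates and realises the scale parameter α i by Gaussian smoothing with standard deviation α i before differentiation; the exact templates and the exact multi-scale formulation of the paper are not reproduced here, and all function names are hypothetical.

```python
import numpy as np

# Standard Sobel templates (assumed; the paper's Eq. (1) is not reproduced).
SOBEL_H = np.array([[-1.0, 0.0, 1.0],
                    [-2.0, 0.0, 2.0],
                    [-1.0, 0.0, 1.0]])
SOBEL_V = SOBEL_H.T


def gaussian_kernel(sigma):
    # Normalised 2-D Gaussian truncated at three standard deviations.
    radius = int(np.ceil(3 * sigma))
    ax = np.arange(-radius, radius + 1)
    g = np.exp(-(ax ** 2) / (2.0 * sigma ** 2))
    kernel = np.outer(g, g)
    return kernel / kernel.sum()


def correlate2d(image, kernel):
    # Same-size sliding-window correlation with edge replication.
    kh, kw = kernel.shape
    padded = np.pad(image, ((kh // 2, kh // 2), (kw // 2, kw // 2)), mode="edge")
    rows, cols = image.shape
    out = np.zeros((rows, cols), dtype=float)
    for dy in range(kh):
        for dx in range(kw):
            out += kernel[dy, dx] * padded[dy:dy + rows, dx:dx + cols]
    return out


def multiscale_sobel(image, alpha):
    # Smooth at scale alpha, then take horizontal and vertical Sobel responses.
    smoothed = correlate2d(image.astype(float), gaussian_kernel(alpha))
    f_h = correlate2d(smoothed, SOBEL_H)
    f_v = correlate2d(smoothed, SOBEL_V)
    magnitude = np.hypot(f_h, f_v)     # gradient magnitude matrix
    direction = np.arctan2(f_v, f_h)   # gradient direction matrix (radians)
    return magnitude, direction


# Horizontal intensity ramp: the gradient points along +x with constant size.
ramp = np.tile(np.arange(32, dtype=float), (32, 1))
magnitude, direction = multiscale_sobel(ramp, alpha=1.0)
```

Calling multiscale_sobel once per scale (for example alpha = 1.0, 2.0, 4.0) gives one magnitude and direction pair per layer. The sketch uses arctan2 rather than a plain arctangent so that the direction stays defined when the horizontal gradient is zero.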
When the original SIFT algorithm detects key points in SAR images, multiplicative speckle noise seriously affects the second derivatives it uses, so reliable key points cannot be detected 15 . The key point detection step of the SIFT algorithm is therefore improved. Experiments show that the multi-scale Harris detection method detects key points with higher repeatability and stronger stability, and is better and much faster than smallest univalue segment assimilating nucleus (SUSAN) corner detection. Based on the gradient calculation, the multi-scale Harris function is used to construct the scale space. The candidate key points of each layer are extracted by finding local maxima and applying non-maximum suppression 16 . In the multi-scale Harris function, α i is the scale of the image, G H,α i and G V,α i are the horizontal and vertical gradients at scale α i respectively, d is an arbitrary parameter, det is the determinant of the matrix, tr is the trace of the matrix, and R is the resulting scale-space response.
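As a rough illustration of the quantities named above, the sketch below computes a Harris-style response det − d · tr² from horizontal and vertical gradients at one scale. It assumes the usual structure-tensor form with a Gaussian integration window of standard deviation √2 · α i and d = 0.04; these choices and the function names are assumptions rather than the paper's exact definition, and np.gradient stands in for the multi-scale Sobel gradients.

```python
import numpy as np


def gaussian_1d(sigma):
    # Normalised 1-D Gaussian truncated at three standard deviations.
    radius = int(np.ceil(3 * sigma))
    ax = np.arange(-radius, radius + 1)
    g = np.exp(-(ax ** 2) / (2.0 * sigma ** 2))
    return g / g.sum()


def gaussian_smooth(array, sigma):
    # Separable Gaussian smoothing (zero padding at the borders).
    k = gaussian_1d(sigma)
    rows = np.apply_along_axis(lambda r: np.convolve(r, k, mode="same"), 1, array)
    return np.apply_along_axis(lambda c: np.convolve(c, k, mode="same"), 0, rows)


def harris_response(g_h, g_v, alpha, d=0.04):
    # Structure tensor entries, averaged over a window tied to the scale.
    window = np.sqrt(2.0) * alpha  # assumed integration scale
    a = gaussian_smooth(g_h * g_h, window)
    b = gaussian_smooth(g_v * g_v, window)
    c = gaussian_smooth(g_h * g_v, window)
    det = a * b - c * c
    trace = a + b
    return det - d * trace ** 2


# Synthetic test image: a bright quadrant whose corner sits near (20, 20).
image = np.zeros((40, 40))
image[20:, 20:] = 1.0
g_v, g_h = np.gradient(image)  # axis 0 is vertical, axis 1 is horizontal
response = harris_response(g_h, g_v, alpha=1.0)
```

The response is positive at the corner, negative along a straight edge and zero in flat regions, which is why candidate key points are taken as local maxima of R. Non-maximum suppression and the loop over scales are omitted from this sketch.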
Construct descriptors and perform feature matching. After feature point detection, the GLOH 17 method is used to build the descriptors. This descriptor improves the processing speed of the algorithm while retaining more structural information of the image. It resolves the inconsistency in the main directions of heterogeneous images caused by the traditional descriptor construction method and makes the final registration result more stable. The nearest neighbor distance ratio (NNDR) 18 method is then used to measure the similarity between descriptors, and the fast sample consensus (FSC) algorithm 19 is used to remove incorrectly matched point pairs.
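The NNDR test mentioned above can be sketched as follows: a descriptor is matched to its nearest neighbour only if that neighbour is clearly closer than the second nearest. The ratio threshold of 0.8, the Euclidean distance and the toy two-dimensional descriptors are assumptions for illustration; real GLOH descriptors are much longer, and the FSC outlier removal step is not shown.

```python
import math


def euclidean(a, b):
    # Euclidean distance between two descriptors of equal length.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nndr_match(desc_a, desc_b, ratio=0.8):
    """Return (index_a, index_b) pairs that pass the distance ratio test."""
    matches = []
    for i, da in enumerate(desc_a):
        # Rank all candidate descriptors by distance to the query descriptor.
        ranked = sorted((euclidean(da, db), j) for j, db in enumerate(desc_b))
        if len(ranked) < 2:
            continue  # the ratio test needs two neighbours
        (best, j_best), (second, _) = ranked[0], ranked[1]
        # Accept only if the nearest neighbour is distinctly the closest.
        if best < ratio * second:
            matches.append((i, j_best))
    return matches


# Toy descriptors (made-up values); the third query is ambiguous.
sar_desc = [[0.0, 0.0], [10.0, 10.0], [2.5, 3.0]]
opt_desc = [[0.0, 1.0], [10.0, 10.0], [5.0, 5.0]]
matches = nndr_match(sar_desc, opt_desc)
```

The third query descriptor is equally far from two candidates, so it is rejected; only the two unambiguous pairs are kept and would then be passed on to outlier removal.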
Reconstruct feature points of the original image. De-redundancy removes image pixels, so the output image quality changes when the de-redundant image is used directly for registration. We therefore propose a feature point reconstruction method so that the final registration is carried out on, and targets, the original input image 20 . The core idea is to use the descriptors obtained after de-redundancy to restore the coordinate information in the original image: the coordinates are compared with the deleted elements recorded in the sets Ω and Ω′ during the de-redundancy process, and the total number of rows and columns removed before the current coordinates is calculated. The coordinates of the corresponding point in the original image are the horizontal and vertical coordinates of the feature point in the de-redundant image plus the corresponding numbers of removed rows and columns. The feature point reconstruction algorithm proceeds as follows:
1. Input the descriptors of the de-redundant image, P = {p 1 , p 2 ,…, p x }, and extract the descriptors of the visible light image and the SAR image.
2. Compare the coordinate information in Θ and Θ′ with the row and column numbers recorded in Ω and Ω′ in turn. Taking Θ and Ω as an example, arrange all i nums in Ω in ascending order and insert p i,1 from Θ into the sorted sequence. The position of p i,1 gives the number of rows i row removed before that point. The number of columns i col removed before the point is obtained in the same way.
3. Repeat step 2 for the other descriptors to obtain their coordinates in the original image. For the ith feature point, the coordinates in the original image are the coordinates in the de-redundant image offset by i row and i col .
4. Obtain the positions of the feature points in the original image, estimate the parameters of the affine transformation model from these feature points, and apply the model to complete the image registration correction.
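The coordinate restoration in the steps above can be sketched as follows, assuming that Ω records the 0-based indices of the removed rows and columns in the original image and that feature point coordinates are (row, column) pairs in the de-redundant image. The function names and the sample values are hypothetical. The sketch walks through the sorted removed indices and shifts the coordinate by one for every removed line that lies at or before it in the original image.

```python
def restore_index(reduced_index, removed_indices):
    """Map an index in the de-redundant image back to the original image."""
    original = reduced_index
    for removed in sorted(removed_indices):
        if removed <= original:
            # A removed line precedes the point, so shift the point by one.
            original += 1
        else:
            # Remaining removed lines lie after the point and do not matter.
            break
    return original


def restore_point(point, removed_rows, removed_cols):
    # Restore the row and the column independently.
    row, col = point
    return restore_index(row, removed_rows), restore_index(col, removed_cols)


# Hypothetical record of removed lines: rows 2, 3 and 7, and column 0.
omega_rows = [2, 3, 7]
omega_cols = [0]

reduced_points = [(2, 5), (5, 0), (1, 3)]
restored = [restore_point(p, omega_rows, omega_cols) for p in reduced_points]
```

With rows 2, 3 and 7 removed, row 2 of the de-redundant image is row 4 of the original, which is the offset i row = 2 described in step 2. The restored points can then be used directly for the affine model estimation in step 4.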

Experimental results and analysis
To evaluate the performance of the proposed method, experiments are carried out on three pairs of SAR and optical images. The experiments are implemented in Python 3.6, the network is built with the PyTorch 1.3 deep learning framework, and CUDA 10.0 and cuDNN 7.0 are configured for GPU acceleration. The test data cover different characteristics, including different resolutions, incidence angles, and seasons. The dataset description is given in Table 1. Experimental results are shown in Figs. 2, 3, 4 and Table 2.
To quantitatively evaluate registration performance, we adopt the root-mean-square error (RMSE) 21 between corresponding matched keypoints, which can be expressed as
$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left[(x_i - x'_i)^2 + (y_i - y'_i)^2\right]}$$
where (x i , y i ) and (x′ i , y′ i ) are the coordinates of the ith matching keypoint pair and n is the total number of matching points. In addition, the correct matching ratio (CMR) is another effective measure, defined as
$$\mathrm{CMR} = \frac{\text{correctMatches}}{\text{correspondences}}$$
where "correspondences" is the number of matches after using PROSAC and "correctMatches" is the number of correct matches after removing false ones. The results of the quantitative evaluation for each method are listed in Table 2.
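Both measures are simple to compute. The sketch below assumes the usual form of the RMSE, the square root of the mean squared distance between matched keypoints, and takes CMR as the ratio of correct matches to correspondences; the function names and the sample coordinates are illustrative and are not results from the paper.

```python
import math


def rmse(points, points_ref):
    """Root-mean-square distance between matched keypoint pairs."""
    n = len(points)
    total = sum((x - xr) ** 2 + (y - yr) ** 2
                for (x, y), (xr, yr) in zip(points, points_ref))
    return math.sqrt(total / n)


def correct_matching_ratio(correct_matches, correspondences):
    # Fraction of the retained correspondences that are correct.
    return correct_matches / correspondences


# Made-up example: one pair is 5 pixels apart, the other coincides exactly.
error = rmse([(0.0, 0.0), (1.0, 1.0)], [(3.0, 4.0), (1.0, 1.0)])
cmr = correct_matching_ratio(correct_matches=8, correspondences=10)
```

A lower RMSE and a higher CMR both indicate better registration, so the two measures are typically reported together.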
It can be seen from Table 2 that the SIFT algorithm fails to match in heterogeneous image registration, that the correct matching rates obtained by the SIFT-M 19 and PSO-SIFT 20 algorithms are relatively low, and that the PSO-SIFT algorithm runs relatively fast. De-redundancy of the image according to a fixed rule greatly reduces the number of feature point pairs used for registration. Reconstructing the feature point pairs in the original image before estimating the affine transformation model ensures the accuracy of heterogeneous image registration. Therefore, the proposed algorithm greatly reduces the running time and improves the efficiency of SAR and optical image registration.

Conclusion
In this paper, we present a robust matching framework for registration between SAR and optical images. First, a novel deep learning network is used to generate high-quality pseudo-optical images from SAR images. Next, feature points are detected and extracted using the multi-scale Harris algorithm. Then, descriptors for the feature points are constructed with the GLOH method. Finally, the actual positions of the feature points are reconstructed through a feedback mechanism for matching. Experimental results demonstrate superior matching performance compared with state-of-the-art methods. Future work will focus on a CNN-based framework that learns to identify corresponding patches in SAR and optical images in a fully automatic manner.